Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels
Abstract
Word sense disambiguation aims to identify which meaning of a word is present in a given usage. Gathering word sense annotations is a laborious and difficult task. Several methods have been proposed to gather sense annotations using large numbers of untrained annotators, with mixed results. We propose three new annotation methodologies for gathering word senses in which untrained annotators are allowed to use multiple labels and weight the senses. Our findings show that, given the appropriate annotation task, untrained workers can obtain at least as high agreement as annotators in a controlled setting, and in aggregate generate an equally good sense labeling.
Similar resources
The Benefits of a Model of Annotation
This paper presents a case study of a difficult and important categorical annotation task (word sense) to demonstrate a probabilistic annotation model applied to crowdsourced data. It is argued that standard (chance-adjusted) agreement levels are neither necessary nor sufficient to ensure high quality gold standard labels. Compared to conventional agreement measures, application of an annotatio...
Crowdsourcing Experiment Designs for Chinese Word Sense Annotation
This paper presents our exploratory efforts to tackle the “high accuracy-low quantity” problem of the human word sense annotation task in Chinese, and ultimately to reach the goal of automatic word sense annotation. Our proposed annotation architecture consists of explicit and implicit aspects of the crowdsourcing approach. The explicit method focuses on the general issues of crowdsourci...
Crowdsourcing Word Sense Definition
In this paper, we propose a crowdsourcing methodology for a single-step construction of both an empirically-derived sense inventory and the corresponding sense-annotated corpus. The methodology taps the intuitions of non-expert native speakers to create an expert-quality resource, and natively lends itself to supplementing such a resource with additional information about the structure and relia...
When Is Word Sense Disambiguation Difficult? A Crowdsourcing Approach
We identified features that drive differential accuracy in word sense disambiguation (WSD) by building regression models using 10,000 coarse-grained WSD instances which were labeled on MTurk. Features predictive of accuracy include properties of the target word (word frequency, part of speech, and number of possible senses), the example context (length), and the Turker’s engagement with our tas...
On the difficulty of making concreteness concrete
The use of labels of semantic properties like ‘concreteness’ is quite common in studies in syntax, but their exact meaning is often unclear. In this article, we compare different definitions of concreteness, and use them in different implementations to annotate nouns in two data sets: (1) all nouns with word sense annotations in the SemCor corpus, and (2) nouns in a particular lexico-syntactic ...